25 research outputs found
Unsupervised ensemble minority clustering
Cluster analysis lies at the core of most unsupervised learning tasks. However, the majority of clustering algorithms depend on the all-in assumption, in which all objects belong to some cluster, and perform poorly on minority clustering tasks, in which a small fraction of signal data stands against a majority of noise.
The approaches proposed so far for minority clustering are supervised: they require the number and distribution of the foreground and background clusters. In supervised learning and all-in clustering, combination methods have been successfully applied to obtain distribution-free learners, even from the output of weak individual algorithms.
In this report, we present a novel ensemble minority clustering algorithm, Ewocs, suitable for weak clustering combination, and provide a theoretical proof of its properties under a loose set of constraints. The validity of the assumptions used in the proof is empirically assessed using a collection of synthetic datasets.
Preprint
Non-parametric document clustering by ensemble methods
The biases of individual algorithms for non-parametric document clustering can lead to non-optimal solutions. Ensemble clustering methods may overcome this limitation, but have not been applied to document collections. This paper presents a comparison of strategies for non-parametric document ensemble clustering.
Peer Reviewed. Postprint (published version)
Unsupervised document clustering by weighted combination
This report proposes a novel unsupervised document clustering approach based on weighted combination of individual clusterings. Two non-weighted combination methods are adapted to work in a weighted fashion: a graph-based method and a probability-based one. The performance of the weighted approach is evaluated on real-world collections, and compared to that of individual clustering and non-weighted combination. The results of this evaluation confirm that graph-based weighted combination consistently outperforms the other approaches.
Postprint (published version)
ParTes. Test suite for parsing evaluation
This paper presents ParTes, the first test suite in Spanish and Catalan for the qualitative evaluation of parsing. This resource is a hierarchical test suite of representative syntactic structure and argument order phenomena. ParTes simplifies qualitative evaluation by contributing to the automation of this task. © 2014 Sociedad Española para el Procesamiento del Lenguaje Natural.
Postprint (published version)
TALP-UPC at TREC 2005: Experiments using voting scheme among three heterogeneous QA systems
This paper describes the experiments of the TALP-UPC group for factoid and 'other' (definitional) questions at the TREC 2005 Main Question Answering (QA) task. Our current approach for factoid questions is based on a voting scheme among three QA systems: TALP-QA (our previous QA system), Sibyl (a new QA system developed at DAMA-UPC and TALP-UPC), and Aranea (a web-based data-driven approach). For definitional questions, we used two different systems: the TALP-QA Definitional system and LCSUM (a summarization-based system).
Our results for factoid questions indicate that the voting strategy improves the accuracy from 7.5% to 17.1%. While these numbers are low (due to technical problems in the Answer Extraction phase of the TALP-QA system), they indicate that voting is a successful approach for boosting the performance of QA systems.
The answer to definitional questions is produced by selecting phrases using a set of patterns associated with definitions. The best configuration of the TALP-QA Definitional system achieves an F-score of 17.2%.
Postprint (published version)
The TALP participation at TAC-KBP 2012
This document describes the work performed by the Universitat Politècnica de Catalunya (UPC) in its first participation at TAC-KBP 2012 in both the Entity Linking and the Slot Filling tasks.
Peer Reviewed. Postprint (author's final draft)
Discounted functionals of Markov processes
SIGLE. Available from British Library Document Supply Centre-DSC:D063340 / BLDSC - British Library Document Supply Centre. GB. United Kingdom